Abstract: I will show that there is a deep relation between error-correction codes and certain mathematical models 
of spin glasses. In particular minimum error probability decoding is equivalent to finding the ground state of the 
corresponding spin system. The most probable value of a symbol is related to the magnetization at a different 
temperature. Convolutional codes correspond to one-dimensional spin systems and Viterbi's decoding algorithm 
to the transfer matrix algorithm of Statistical Mechanics. A particular spin-glass model, which is exactly soluble, 
corresponds to an ideal code, i.e. a code which allows error-free communication if the rate is below channel capacity. 

The mathematical theory of communication [1, 2] is probabilistic in nature. Both the production 
of information and its transmission are considered as probabilistic events. A source produces 
information messages according to a certain probability distribution. Each message consists of a 
sequence of N bits $\sigma = \{\sigma_1, \dots, \sigma_N\}$, $\sigma_i = \pm 1$, and it is assumed that the probability 
$P_s(\sigma) = \exp(-H_s(\sigma))$ of any particular sequence $\sigma$ is known. According to Shannon, the information 
content of the message is $-\ln P_s(\sigma)$ and the average information of the source is given by 

$$-\sum_{\sigma} P_s(\sigma) \ln P_s(\sigma)$$



The messages are sent through a transmission channel. In general there is noise during transmission 
(which may have different origins) which corrupts the transmitted message. If $\sigma = \pm 1$ is sent 
through the transmission channel, because of the noise, the output will be a real number $u$, in general 
different from $\sigma$. Again, the statistical properties of the transmission channel are supposed to be 
known. Let us call $Q(u|\sigma)\,du$ the probability for the transmission channel's output to be between $u$ 
and $u + du$ when the input was $\sigma$. Because of the noise during the transmission, there is a loss of 
information. The channel capacity C is defined as the maximum information per unit time which can be 
transmitted through the channel. The maximum is taken over all possible sources. 

Thanks to Shannon's "source coding theorem", it is always possible to encode the source in such a 
way that all sequences become equally probable ($H_s(\sigma)$ = const., independent of the $\sigma$'s). 
Source encoding reduces the redundancy in the source messages (not to be confused with "channel 
encoding", see later). 

For reasons of simplicity, we will assume in the following that the source has been encoded and 
that the noise is independent for any pair of bits ("memoryless channel"), i.e. 

$$Q(u|\sigma) = \prod_i Q(u_i|\sigma_i)$$



In the case of a memoryless channel with Gaussian noise, Shannon calculated the channel capacity 

$$C = \frac{1}{2} \log_2\left(1 + \frac{v^2}{w^2}\right)$$

where $v^2/w^2$ is the signal-to-noise ratio. In the weak signal-to-noise limit, $C \simeq v^2/(2 w^2 \ln 2)$. 
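As a numerical illustration of the capacity formula and of its weak-signal limit, the following minimal sketch compares the two expressions. The function names are hypothetical and chosen only for this example; `snr` stands for the ratio $v^2/w^2$.

```python
import math


def gaussian_capacity(snr: float) -> float:
    """Capacity in bits per channel use: C = (1/2) log2(1 + snr)."""
    return 0.5 * math.log2(1.0 + snr)


def weak_signal_capacity(snr: float) -> float:
    """Leading-order approximation C ~ snr / (2 ln 2), meant for snr << 1."""
    return snr / (2.0 * math.log(2.0))


# The two expressions approach each other as the signal-to-noise ratio decreases.
for snr in (1.0, 0.1, 0.001):
    print(snr, gaussian_capacity(snr), weak_signal_capacity(snr))
```

The approximation always lies slightly above the exact value, since $\ln(1+x) \le x$.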

Under the above assumptions, communication is a statistical inference problem. Given the 
transmission channel's output and the statistical properties of the source and of the channel, one 
has to infer what message was sent. In order to reduce communication errors, one may introduce 
(deterministic) redundancy into the message ("channel encoding") and use this redundancy to infer 
the message sent through the channel ("decoding"). The algorithms which transform the source 
outputs to redundant messages are called error-correcting codes. More precisely, instead of sending 
the N original bits $\sigma_i$, one sends M bits $J_k^{in}$, $k = 1, \dots, M$, $M > N$, constructed in the following 
way 

$$J_k^{in} = C^{(k)}_{i_1 \cdots i_{l_k}} \, \sigma_{i_1} \cdots \sigma_{i_{l_k}} \qquad (1)$$

where the "connectivity" matrix $C^{(k)}_{i_1 \cdots i_{l_k}}$ has elements zero or one. For any k, all the 
$C^{(k)}_{i_1 \cdots i_{l_k}}$ except one are equal to zero, i.e. the $J_k^{in}$ are equal to ±1. $C^{(k)}_{i_1 \cdots i_{l_k}}$ defines 
the code, i.e. it tells from which of the $\sigma$'s to construct the kth bit of the code. Codes of this kind 
are called parity checking codes because $J_k^{in}$ counts the parity of the minus signs among the $l_k$ $\sigma$'s. 
The ratio R = N/M, which specifies the redundancy of the code, is called the rate of the code. 

We illustrate with a simple example of an R = 1/2 code. From the N $\sigma_i$'s we construct the 2N 
bits $J_k^{1,in}$, $J_k^{2,in}$, $k = 1, \dots, N$: 

$$J_k^{1,in} = \sigma_{k-1} \sigma_k \sigma_{k+1}, \qquad J_k^{2,in} = \sigma_{k-1} \sigma_{k+1}$$
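The R = 1/2 example can be sketched in a few lines of Python. The text does not say how the neighbours of the first and last bits are defined; periodic boundary conditions are assumed here purely for illustration, and the function name `encode_rate_half` is hypothetical.

```python
from typing import List, Tuple


def encode_rate_half(sigma: List[int]) -> Tuple[List[int], List[int]]:
    """Return (J1, J2) with J1_k = s_{k-1} s_k s_{k+1} and J2_k = s_{k-1} s_{k+1}.

    Indices are taken modulo N (periodic boundaries, an assumption of this sketch).
    """
    n = len(sigma)
    j1, j2 = [], []
    for k in range(n):
        left = sigma[(k - 1) % n]
        right = sigma[(k + 1) % n]
        j1.append(left * sigma[k] * right)
        j2.append(left * right)
    return j1, j2


j1, j2 = encode_rate_half([1, -1, 1, 1])
print(j1, j2)
```

Since $J_k^{1,in} J_k^{2,in} = \sigma_k$, the noiseless codeword determines the message uniquely: N source bits are mapped to 2N transmitted bits.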

Knowing the source probability, the noise probability, the code and the channel output, one 
has to infer the message that was sent. The quality of inference depends on the choice of the code. 

According to Shannon's famous channel encoding theorem, there exist codes such that, in 
the limit of infinitely long messages, it is possible to communicate error-free, provided the rate of the 
code R is less than the channel capacity C. This theorem says that such "ideal" codes exist, but 
does not say how to construct them. 

We will now show that there exists a close mathematical relationship between error-correcting 
codes and theoretical models of disordered systems [3, 4, 5, 6]. As we previously said, the output of the 
channel is a sequence of M real numbers $J^{out} = \{J_k^{out}, k = 1, \dots, M\}$, which are random variables 
obeying the probability distribution $Q(J_k^{out}|J_k^{in})$. Once the channel output $J^{out}$ is known, it is 
possible to compute the probability $P(\tau|J^{out})$ for any particular sequence $\tau = \{\tau_i, i = 1, \dots, N\}$ 
to be the source output (i.e. the information message). 

More precisely, the equivalence between spin-glass models and error correcting codes is based 
on the following property [5, 6]. 

The probability $P(\tau|J^{out})$ for any sequence $\tau = \{\tau_i, i = 1, \dots, N\}$ to be the information 
message, conditional on the channel output $J^{out} = \{J_k^{out}, k = 1, \dots, M\}$, is given by 

$$\ln P(\tau|J^{out}) = \mathrm{const} - H_s(\tau) + \sum_{k=1}^{M} C^{(k)}_{i_1 \cdots i_{l_k}} B_k \, \tau_{i_1} \cdots \tau_{i_{l_k}} \equiv -H_t(\tau) \qquad (2)$$

where 

$$B_k = B_k(J_k^{out}) = \frac{1}{2} \ln \frac{Q(J_k^{out}|1)}{Q(J_k^{out}|-1)} \qquad (3)$$

We recognize in this expression the Hamiltonian of a p-spin spin glass. The distribution of the 
couplings is determined by the probability $Q(J^{out}|J^{in})$. 
The proof is the following. The probability $P(\tau|J^{out})$ for the source output to be $\tau$ when the 
channel output is $J^{out}$ is, by Bayes' formula, 

$$P(\tau|J^{out}) = \frac{P_s(\tau)\, Q(J^{out}|J^{in})}{\sum_{\tau} P_s(\tau)\, Q(J^{out}|J^{in})}$$

where $J_k^{in} = C^{(k)}_{i_1 \cdots i_{l_k}} \tau_{i_1} \cdots \tau_{i_{l_k}}$. Because the channel is memoryless and $J_k^{in} = \pm 1$, 

$$\ln P(\tau|J^{out}) = \mathrm{const.} + \sum_{k=1}^{M} \ln Q(J_k^{out}|J_k^{in}) + \ln P_s(\tau)$$

$$\ln Q(J_k^{out}|J_k^{in}) = \frac{1}{2} \ln\left( Q(J_k^{out}|1)\, Q(J_k^{out}|-1) \right) + \frac{J_k^{in}}{2} \ln \frac{Q(J_k^{out}|1)}{Q(J_k^{out}|-1)}$$

where const. means independent of $J^{in}$. To complete the proof, one has to substitute the $J_k^{in}$'s 
according to their definition as a product of the $\tau$'s. 

"Minimum error probability decoding" (or MED, see later), which is widely used in communications, 
consists in choosing the most probable sequence $\tau^0$. This is equivalent to finding the ground 
state of the above spin-glass Hamiltonian. 
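To make the ground-state formulation concrete, here is a minimal brute-force sketch for very short messages. It assumes equally probable source messages ($H_s$ constant), the R = 1/2 example code with periodic boundaries, and a Gaussian channel with fields $B_k = J_k^{out}/w^2$; all function names and the channel output `j_out` are invented for the illustration, and the exhaustive search over $2^N$ sequences is only feasible for small N.

```python
from itertools import product
from math import prod


def rate_half_checks(n):
    """Index tuples of the R = 1/2 example (periodic boundaries are an assumption)."""
    triples = [((k - 1) % n, k, (k + 1) % n) for k in range(n)]
    pairs = [((k - 1) % n, (k + 1) % n) for k in range(n)]
    return triples + pairs


def energy(tau, checks, fields):
    """H_t(tau) = -sum_k B_k * prod_{i in check k} tau_i, for a uniform source."""
    return -sum(b * prod(tau[i] for i in idx) for idx, b in zip(checks, fields))


def ground_state(n, checks, fields):
    """Most probable sequence, found by exhaustive search over all 2^n sequences."""
    return min(product((1, -1), repeat=n), key=lambda tau: energy(tau, checks, fields))


n, w = 4, 1.0
checks = rate_half_checks(n)
# Hypothetical channel output for the all-ones message; one value has the wrong sign.
j_out = [0.9, 1.2, -0.3, 0.8, 1.1, 0.7, 1.3, 0.4]
fields = [j / w**2 for j in j_out]
decoded = ground_state(n, checks, fields)
print(decoded)
```

In this example the single corrupted output is outvoted by the remaining checks, so the search returns the all-ones message.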

In the case when $Q(J^{out}|J^{in}) = Q(-J^{out}|-J^{in})$ (the case of a "symmetric channel"), 
$B_k(J_k^{out}) = -B_k(-J_k^{out})$ and one recovers the invariance of the spin-glass Hamiltonian under gauge 
transformations $\tau_i \to \epsilon_i \tau_i$, $B_k \to B_k \, \epsilon_{i_1} \cdots \epsilon_{i_{l_k}}$, $\epsilon_i = \pm 1$. 
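The gauge invariance can be checked numerically on a small example. The sketch below assumes a uniform source, so that the Hamiltonian reduces to $-\sum_k B_k \tau_{i_1} \cdots \tau_{i_{l_k}}$; the checks, fields and function names are arbitrary choices made for the illustration.

```python
from math import prod


def energy(tau, checks, fields):
    """H_t(tau) = -sum_k B_k * prod_{i in check k} tau_i."""
    return -sum(b * prod(tau[i] for i in idx) for idx, b in zip(checks, fields))


def gauge_transform(tau, checks, fields, eps):
    """Apply tau_i -> eps_i tau_i and B_k -> B_k * prod_{i in check k} eps_i."""
    new_tau = [e * t for e, t in zip(eps, tau)]
    new_fields = [b * prod(eps[i] for i in idx) for idx, b in zip(checks, fields)]
    return new_tau, new_fields


checks = [(0, 1, 2), (1, 2, 3), (0, 2)]   # arbitrary multiplets
fields = [0.7, -1.3, 0.4]                 # arbitrary couplings B_k
tau = [1, -1, -1, 1]
eps = [-1, 1, -1, -1]

new_tau, new_fields = gauge_transform(tau, checks, fields, eps)
print(energy(tau, checks, fields), energy(new_tau, checks, new_fields))
```

Each term picks up the factor $\epsilon_{i_1}^2 \cdots \epsilon_{i_{l_k}}^2 = 1$, so the two energies coincide.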

When all messages are equally probable and the transmission channel is memoryless and symmetric, 
the error probability is the same for all input sequences. It is enough to compute it in the case 
where all input bits are equal to one. In this case, the error probability per bit $P_e$ is 
$P_e = (1 - m^{(d)})/2$, where $m^{(d)} = \frac{1}{N} \sum_{i=1}^{N} \tau_i^{(d)}$ and $\tau_i^{(d)}$ is the symbol sequence 
produced by the decoding procedure. 

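Let us give a couple of examples of symmetric channels. The first is the case of Gaussian noise 
(the "Gaussian channel"): 

$$Q(J^{out}|J^{in}) = c \exp\left( -\frac{(J^{out} - J^{in})^2}{2 w^2} \right), \qquad B_k = \frac{J_k^{out}}{w^2} \qquad (4)$$

The other example is when the output is again ±1 (the "binary symmetric channel"): 

$$Q(J^{out}|J^{in}) = (1-p)\, \delta_{J^{out}, J^{in}} + p\, \delta_{J^{out}, -J^{in}}$$

$$B_k = \frac{1}{2} \delta_{J_k^{out},1} \ln\frac{1-p}{p} + \frac{1}{2} \delta_{J_k^{out},-1} \ln\frac{p}{1-p} = \frac{J_k^{out}}{2} \ln\frac{1-p}{p} \qquad (5)$$

(the last equality holds because in this case $J_k^{out} = \pm 1$). 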
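The couplings for these two channels can be checked against the general definition (3), $B_k = \frac{1}{2} \ln [Q(J_k^{out}|1)/Q(J_k^{out}|-1)]$. The sketch below assumes inputs $\pm 1$, Gaussian noise of variance $w^2$, and a flip probability p for the binary symmetric channel; the function names are hypothetical.

```python
import math


def field_from_likelihoods(q_plus: float, q_minus: float) -> float:
    """General definition (3): B = (1/2) ln[Q(J_out | +1) / Q(J_out | -1)]."""
    return 0.5 * math.log(q_plus / q_minus)


def gaussian_likelihood(j_out: float, j_in: int, w: float) -> float:
    """Unnormalised Gaussian likelihood; the normalisation cancels in the ratio."""
    return math.exp(-((j_out - j_in) ** 2) / (2.0 * w**2))


def bsc_likelihood(j_out: int, j_in: int, p: float) -> float:
    """Binary symmetric channel: the bit is flipped with probability p."""
    return 1.0 - p if j_out == j_in else p


def gaussian_field(j_out: float, w: float) -> float:
    """Closed form for the Gaussian channel: B = J_out / w^2."""
    return j_out / w**2


def bsc_field(j_out: int, p: float) -> float:
    """Closed form for the binary symmetric channel: B = (J_out / 2) ln((1-p)/p)."""
    return 0.5 * j_out * math.log((1.0 - p) / p)


j, w, p = 0.3, 0.8, 0.1
print(field_from_likelihoods(gaussian_likelihood(j, 1, w), gaussian_likelihood(j, -1, w)),
      gaussian_field(j, w))
print(field_from_likelihoods(bsc_likelihood(-1, 1, p), bsc_likelihood(-1, -1, p)),
      bsc_field(-1, p))
```

Both closed forms are odd in $J_k^{out}$, as required for a symmetric channel.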

Instead of considering the most probable instance, one may only be interested in the most 
probable value $\tau_i^p$ of the "bit" $\tau_i$ [7, 8, 9]. Because $\tau_i = \pm 1$, the probability $p_i$ for $\tau_i = 1$ is simply 
related to $m_i$, the average of $\tau_i$: $p_i = (1 + m_i)/2$. 

$$m_i = \frac{1}{Z} \sum_{\{\tau_1 \cdots \tau_N\}} \tau_i \exp(-H_t(\tau)), \qquad Z = \sum_{\{\tau_1 \cdots \tau_N\}} \exp(-H_t(\tau)), \qquad \tau_i^p = \mathrm{sign}(m_i) \qquad (6)$$
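Equation (6) can be evaluated by direct enumeration when N is small. The following sketch assumes a uniform source, so that $-H_t(\tau) = \sum_k B_k \tau_{i_1} \cdots \tau_{i_{l_k}}$ up to a constant; the names `magnetizations` and `decode_bitwise` and the example couplings are hypothetical.

```python
from itertools import product
from math import exp, prod


def magnetizations(n, checks, fields):
    """Thermal averages m_i at temperature T = 1, by summing over all 2^n sequences."""
    z = 0.0
    sums = [0.0] * n
    for tau in product((1, -1), repeat=n):
        # Boltzmann weight exp(-H_t(tau)) for a uniform source.
        weight = exp(sum(b * prod(tau[i] for i in idx) for idx, b in zip(checks, fields)))
        z += weight
        for i in range(n):
            sums[i] += tau[i] * weight
    return [s / z for s in sums]


def decode_bitwise(n, checks, fields):
    """Most probable value of each bit: the sign of m_i (ties resolved to +1 here)."""
    return [1 if m >= 0 else -1 for m in magnetizations(n, checks, fields)]


checks = [(0,), (0, 1)]   # a one-spin and a two-spin term, chosen for illustration
fields = [0.7, 0.4]
print(magnetizations(2, checks, fields), decode_bitwise(2, checks, fields))
```

For this two-spin example the averages can be computed by hand: $m_0 = \tanh(0.7)$ and $m_1 = \tanh(0.7)\tanh(0.4)$.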

In the previous equation $m_i$ is obviously the thermal average at temperature T = 1. It is amusing 
to notice that for the Gaussian channel or the binary symmetric channel, T = 1 corresponds 
to Nishimori's temperature [10]. Another amusing observation is that the so-called convolutional 
codes, which are extremely popular in communications, correspond to one-dimensional spin glasses. 
Furthermore the decoding algorithm, which is called dynamic programming or the "Viterbi decoding 
algorithm", is nothing else than the transfer matrix algorithm of statistical mechanics. 
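As an illustration of this correspondence, here is a minimal sketch of a zero-temperature (min-plus) transfer-matrix sweep, which is the Viterbi recursion, for the R = 1/2 example code. Several choices are assumptions made for the sketch and are not taken from the text: the source is uniform, the two boundary spins $\tau_0$ and $\tau_{N+1}$ are fixed to +1 (a terminated code), and `b1`, `b2` are hypothetical names for the couplings $B_k$ of the three-spin and two-spin terms.

```python
def viterbi_decode(b1, b2):
    """Ground state of H = -sum_k (b1[k] t_{k-1} t_k t_{k+1} + b2[k] t_{k-1} t_{k+1}).

    The state carried along the chain is the pair (t_{k-1}, t_k); for each state
    only the lowest-energy partial sequence (the "survivor") is kept.
    """
    n = len(b1)
    # Left boundary spin fixed to +1; the first message spin is free.
    survivors = {(1, s): (0.0, (s,)) for s in (1, -1)}
    for k in range(n):
        # The spin to the right of the last message spin is the fixed boundary +1.
        next_values = (1,) if k == n - 1 else (1, -1)
        new_survivors = {}
        for (a, b), (e, path) in survivors.items():
            for c in next_values:
                e_new = e - (b1[k] * a * b * c + b2[k] * a * c)
                if (b, c) not in new_survivors or e_new < new_survivors[(b, c)][0]:
                    new_survivors[(b, c)] = (e_new, path + (c,))
        survivors = new_survivors
    energy, path = min(survivors.values())
    return path[:n], energy   # drop the trailing boundary spin


print(viterbi_decode([0.9, -0.3, 1.2, 0.8, 0.5], [1.1, 0.7, -0.2, 1.3, 0.6]))
```

The cost is linear in N with four states per site, whereas exhaustive search over all sequences grows as $2^N$. Replacing the minimum by a sum of Boltzmann weights gives the finite-temperature transfer matrix.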

So the equivalence between parity checking error correcting codes and theoretical models of 
spin glasses is quite general and we have established the following dictionary of correspondence. 

Error-correcting code              ⟺  Spin Hamiltonian 
Signal to noise                    ⟺  $J^2/\Delta J^2$ 
Maximum likelihood decoding        ⟺  Find a ground state 
Error probability per bit          ⟺  Ground state magnetization 
Sequence of most probable symbols  ⟺  Magnetization at temperature T = 1 
Convolutional codes                ⟺  One-dimensional spin glasses 
Viterbi decoding                   ⟺  Transfer matrix algorithm 

This correspondence is not only an amusing mathematical curiosity but can also be made useful 
by using the tools of modern statistical mechanics and the theory of disordered systems. Given a 
code, one can compute the error probability per bit if one is able to calculate the magnetization of 
the corresponding spin-glass model. There are at least two cases where this can be done. 

a) Weak noise limit. Imagine first the case of no noise. The minimal requirement for a good 
code is that the corresponding spin system has a unique ground state, well separated from the 
excited states by a finite energy gap. Consider next slowly switching on the noise. The energy 
levels become random variables whose probability distribution can eventually be computed. An error 
occurs when there is a level crossing and a formerly excited state acquires a lower energy than the spin 
configuration which was the ground state in the absence of noise. The probability of this happening 
may be computed in certain cases. 

b) Extensive connectivity. This case corresponds to a mean field limit. To be precise, we consider 
the case of a Gaussian symmetric channel (i.e. Gaussian noise) and a code defined by the following 
connectivity matrix $C^{(k)}_{i_1 \cdots i_{l_k}}$ (see equation (1)): $l_k = p$ for all k and $C_{i_1 \cdots i_p} = 1$ for all possible 
p-spin multiplets. There are $M = N!/(p!\,(N-p)!)$ such multiplets. Therefore the rate of the code 
is $R = N/M = p!\,(N-p)!/(N-1)!$. We consider the limit $N \to \infty$, $p \to \infty$, $p^2/N \to 0$ and 
$p/\ln N \to \infty$. In this limit, the corresponding spin model is a slight generalization of Derrida's 
random energy model (REM) [11]. This is easily seen if one considers the case of all input bits 
equal to one. (This is not a loss of generality because all input sequences are obviously equivalent 
when the noise is symmetrically distributed around zero.) Derrida considered the case of Gaussian 
random couplings with zero average and variance $\Delta J^2 = W^2$. The only difference with 
the present case is that the coupling average is $J = V$, where $V^2$ is the signal power, which is 
nonzero. (In fact it can be shown that Gaussian noise is not required. Only the first two moments 
of the noise distribution are relevant, i.e. the computation is valid not only for Gaussian noise but 
also for more general symmetric noise distributions.) 

For the spin model to have a nontrivial thermodynamic limit, we consider the case of a signal 
power such that $V = v\,p!/N^{p-1}$ and a noise power $W^2 = w^2 p!/N^{p-1}$, p and $N \to \infty$, while v and w 
are kept fixed. The signal to noise power ratio is then $V^2/W^2 = v^2 p!/(w^2 N^{p-1})$. Using arguments à 
la Derrida or "replica" calculations (neither of these arguments is rigorous), it can be shown that, 
in this model, the ground state magnetization is m = 1 for $v^2/w^2 > 2 \ln 2$ and zero otherwise. 
As we saw above, m = 1 means zero error probability per bit for the corresponding code. The 
above inequality $v^2/w^2 > 2 \ln 2$ is equivalent to R < C. In other words, the error-correcting code 
corresponding to the random energy model is an ideal code, i.e. it allows error-free communication 
if R < C. One may wonder how fast this code approaches the asymptotic regime. It turns out that 
it is possible to compute the asymptotic expansion of m as $p \to \infty$. This is done by using the 
"replica method". The result of this computation is that for $v^2/w^2 \simeq 2 \ln 2$, 



$$m = 1 - \frac{\exp(-p\, v^2/w^2)}{\sqrt{p}} \, c(v^2/w^2), \qquad c(2 \ln 2) = 0.987$$

To the best of my knowledge, the only other explicitly known ideal codes are pulse position 
modulation (or ppm) codes. They can briefly be described as follows. During a time interval T, one 
can transmit one of N possible symbols. T is divided into N subintervals of duration $\delta = T/N$. To 
send the i'th symbol, one sends during the i'th time subinterval an electric pulse of duration $\delta$ and 
amplitude h. $\delta$ and h have to be chosen depending on the noise power and the desired reliability. 
It can be shown that in the limit $h \to \infty$ and $\delta \to 0$, this code is ideal. Both the REM and ppm 
codes become ideal in the limit of infinite redundancy and zero signal to noise power. 
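The equivalence between $v^2/w^2 > 2 \ln 2$ and $R < C$ quoted for the REM code can be checked in a few lines. The sketch below relies on two assumptions: the weak-signal form of the capacity applies (the signal to noise ratio $V^2/W^2$ vanishes as $N \to \infty$), and $(N-1)!/(N-p)! \simeq N^{p-1}$, which holds when $p^2/N \to 0$.

```latex
\begin{align*}
R &= \frac{N}{M} = \frac{p!\,(N-p)!}{(N-1)!} \simeq \frac{p!}{N^{p-1}}, \\
C &\simeq \frac{1}{2 \ln 2}\,\frac{V^2}{W^2}
   = \frac{v^2}{2\, w^2 \ln 2}\,\frac{p!}{N^{p-1}}, \\
R < C &\iff 1 < \frac{v^2}{2\, w^2 \ln 2}
      \iff \frac{v^2}{w^2} > 2 \ln 2 .
\end{align*}
```

The common factor $p!/N^{p-1}$ cancels, so the condition depends only on the fixed parameters v and w, while both the rate and the capacity go to zero.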

Up to now we only considered parity checking codes, for which the "alphabet" has length 2, i.e. 
there are only two symbols, $\sigma = 1$ and $\sigma = -1$. Let me finally mention that many of the previous 
results can be generalized [5, 6] to the case of an alphabet of length l. One may establish a one-to-one 
correspondence between the l symbols of the alphabet and the elements of a finite group with the 
same number of elements. Spin multiplication is replaced by group multiplication. These codes can 
be seen as an interpolation between parity checking codes and pulse position modulation codes. 